Deep Belief Nets as Function Approximators for Reinforcement Learning
Authors
Abstract
We describe a continuous state/action reinforcement learning method which uses deep belief networks (DBNs) in conjunction with a value function-based reinforcement learning algorithm to learn effective control policies. Our approach is to first learn a model of the state-action space from data in an unsupervised pretraining phase, and then use neural-fitted Q-iteration (NFQ) to learn an accurate value function approximator (analogous to a “fine-tuning” phase when training DBNs for classification). Our experiments suggest that this approach has the potential to significantly increase the efficiency of the learning process in NFQ, provided care is taken to ensure the initial data covers interesting areas of the state-action space, and may be particularly useful in transfer learning settings.
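The fine-tuning phase described above relies on NFQ's outer loop, which repeatedly turns a fixed batch of transitions into a supervised regression problem. The following is a minimal, framework-free sketch of that loop. It is not the authors' implementation. It assumes a reward-maximizing formulation, approximates the max over a continuous action space with a finite set of candidate actions, and uses a hypothetical `fit` routine in place of neural-network training. The DBN pretraining step is not shown, and every name here (`nfq_targets`, `nfq`, `table_fit`) is illustrative.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
# One observed transition: (state, action, reward, next_state, terminal).
Transition = Tuple[State, Action, float, State, bool]
# One supervised training pattern: (state + action input vector, target value).
Pattern = Tuple[Tuple[float, ...], float]
QFunction = Callable[[State, Action], float]
# Hypothetical supervised-learning step: takes patterns and the current Q, returns a refitted Q.
FitFunction = Callable[[List[Pattern], QFunction], QFunction]


def nfq_targets(
    transitions: Sequence[Transition],
    q: QFunction,
    candidate_actions: Sequence[Action],
    gamma: float,
) -> List[Pattern]:
    """Turn a fixed batch of transitions into one NFQ regression data set.

    Each (s, a) pair is an input. Its target is the one-step backup
    r + gamma * max_b Q(s', b), or just r when s' is terminal. The max over a
    continuous action space is approximated by a finite candidate set (assumption).
    """
    patterns: List[Pattern] = []
    for s, a, r, s_next, terminal in transitions:
        if terminal:
            target = r
        else:
            target = r + gamma * max(q(s_next, b) for b in candidate_actions)
        patterns.append((s + a, target))
    return patterns


def nfq(
    transitions: Sequence[Transition],
    q: QFunction,
    fit: FitFunction,
    candidate_actions: Sequence[Action],
    iterations: int = 10,
    gamma: float = 0.95,
) -> QFunction:
    """NFQ outer loop: rebuild targets from the current Q, refit, repeat.

    In the approach described above, the initial `q` would be a network whose
    weights come from unsupervised DBN pretraining (not shown here).
    """
    for _ in range(iterations):
        patterns = nfq_targets(transitions, q, candidate_actions, gamma)
        q = fit(patterns, q)
    return q


def table_fit(patterns: List[Pattern], q: QFunction) -> QFunction:
    """Toy stand-in for network training: memorize targets, default to 0."""
    table = {x: y for x, y in patterns}
    return lambda s, a: table.get(s + a, 0.0)


if __name__ == "__main__":
    # Two-state toy batch: action 1.0 in state 0.0 ends the episode with reward 1.
    # Action 0.0 in state 1.0 gives reward 0 and moves to state 0.0.
    batch = [
        ((0.0,), (1.0,), 1.0, (1.0,), True),
        ((1.0,), (0.0,), 0.0, (0.0,), False),
    ]
    actions = [(0.0,), (1.0,)]
    q_final = nfq(batch, lambda s, a: 0.0, table_fit, actions, iterations=3, gamma=0.5)
    print(q_final((1.0,), (0.0,)))  # 0.5 = 0 + 0.5 * Q((0.0,), (1.0,))
```

The `table_fit` helper exists only so the sketch runs without a neural-network library. In NFQ proper it would be replaced by supervised training of a multilayer perceptron. This sketch leaves open whether each refit warm-starts from the previous weights or from the pretrained DBN weights.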
Similar resources
Phase-Parametric Policies for Reinforcement Learning in Cyclic Environments
In many reinforcement learning problems, parameters of the model may vary with its phase while the agent attempts to learn through its interaction with the environment. For example, an autonomous car’s reward on selecting a path may depend on traffic conditions at the time of the day or the transition dynamics of a drone may depend on the current wind direction. Many such processes exhibit a cy...
On Discontinuous Q-Functions in Reinforcement Learning
This paper considers the application of reinforcement learning to path finding tasks in continuous state space in the presence of obstacles. We show that cumulative evaluation functions (as Q-Functions [28] and V-Functions [4]) may be discontinuous if forbidden regions (as implied by obstacles) exist in state space. As the infinite number of states requires the use of function approximators such as ...
Batch mode reinforcement learning based on the synthesis of artificial trajectories
In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value functi...
Deep Reinforcement Learning for Robotic Manipulation - The state of the art
The focus of this work is to enumerate the various approaches and algorithms that center around application of reinforcement learning in robotic manipulation tasks. Earlier methods utilized specialized policy representations and human demonstrations to constrict the policy. Such methods worked well with continuous state and policy space of robots but failed to come up with generalized policies....
Deep Belief Networks Are Compact Universal Approximators
Deep Belief Networks (DBN) are generative models with many layers of hidden causal variables, recently introduced by Hinton et al. (2006), along with a greedy layer-wise unsupervised learning algorithm. Building on Le Roux and Bengio (2008) and Sutskever and Hinton (2008), we show that deep but narrow generative networks do not require more parameters than shallow ones to achieve universal appr...